In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import statsmodels.api as sm
import numpy as np

Q1 (10 points)

The heart dataframe contains the survival time after receiving a heart transplant, the age of the patient and whether or not the survival time was censored

  • Number of Observations - 69
  • Number of Variables - 3

Variable name definitions::

  • survival - Days after surgery until death
  • censors - indicates if an observation is censored. 1 is uncensored
  • age - age at the time of surgery

Answer the following questions with respect to the heart data set:

  • Sort the data frame by age in descending order (oldest at top) without making a copy
  • How many patients were censored?
  • What is the average age for uncensored patients under the age of 45?
  • Find the mean and standard deviation of age and survival time for each value of the cenoring variable.
  • Plot the linear regression of survival (y-axis) against age (x-axis) conditioned on censoring (i.e. either have two separate plots or a single plot using color to distinguish censored and uncensored patients).

In [2]:
heart = sm.datasets.heart.load_pandas().data
heart.head(n=6)


Out[2]:
survival censors age
0 15.0 1.0 54.3
1 3.0 1.0 40.4
2 624.0 1.0 51.0
3 46.0 1.0 42.5
4 127.0 1.0 48.0
5 64.0 1.0 54.6

In [3]:
heart.sort_values('age', ascending=False, inplace=True)
heart.head()


Out[3]:
survival censors age
17 60.0 1.0 64.5
21 47.0 1.0 61.5
12 730.0 1.0 58.4
8 23.0 1.0 56.9
47 63.0 1.0 56.4

In [4]:
sum(heart.censors == 0)


Out[4]:
24

In [5]:
heart.loc[(heart.censors == 1) & (heart.age < 45), 'age'].mean()


Out[5]:
38.418181818181814

In [6]:
heart.groupby(['censors']).agg(['mean', 'std'])


Out[6]:
survival age
mean std mean std
censors
0.0 664.625000 526.214604 41.729167 9.177026
1.0 223.288889 332.657182 48.484444 7.840163

In [7]:
import seaborn as sns

sns.lmplot(data=heart, x='age', y='survival', hue='censors')
pass


Q2 (10 points)

Write a flatmap function that works like map except that the function given takes a list and returns a list of lists that is then flattened (4 points).

In other words, flatmap takes two arguments, a function and a list (or other iterable), just like map. Howevver the function given as the first agument takes a single argument and returns a list (or ohter iterable). In order to get a simple list back, we need to unravel the reuslting list of lists, hence the flatten part.

For example,

flatmap(lambda x: x.split(), ["hello world", "the quick dog"])

should return

["hello", "world", "the", "quick", "dog"]

In [8]:
def flatten(list_of_lists):
    """Flatten a list of lists."""
    for alist in list_of_lists:
        for item in alist:
            yield item

def flatmap(f, xs):
    """First map, then flatten result."""
    return flatten(map(f, xs))

In [9]:
list(flatmap(lambda x: x.split(), ["hello world", "the quick dog"]))


Out[9]:
['hello', 'world', 'the', 'quick', 'dog']

Q3 (10 points)

An affine transformation of a vector $x$ is the operation $Ax + b$, where $A$ is a matrix and $b$ is a vector.

  • Write a function to perform an affine transformation.
  • Write a function to reverse the affine transformation
  • Perform an affine transformation of a random 3 by 3 matrix A, and random 3-vectors $x$ and $b$ drawn from the standard uniform distribution with random seed = 1234 and save the result as $y$. Perform the reverse affine transform on $y$ to recover the original vector $x$.

In [10]:
def affine(x, A, b):
    """Affine transofrm."""
    return A@x + b

In [11]:
def rev_affine(y, A, b):
    """Reverse affine transform."""
    return np.linalg.solve(A, (y - b))

In [12]:
A = np.random.random((3,3,))
b = np.random.random(3)
x = np.random.random(3)

In [13]:
x


Out[13]:
array([ 0.91397072,  0.44084231,  0.73953615])

In [14]:
y = affine(x, A, b)
y


Out[14]:
array([ 1.61972625,  1.53725841,  1.14072047])

In [15]:
rev_affine(y, A, b)


Out[15]:
array([ 0.91397072,  0.44084231,  0.73953615])

Q4 (10 points)

You are given the following DNA sequecne in FASTA format.

dna = '''> A simulated DNA sequence.
TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAAC
GACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGA
GCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGG
ACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA
'''
  • Remove the comment line and combine the 4 lines of nucleotide symbols into a single string
  • Count the frequecny of all 16 two-letter combinations in the string.

In [16]:
dna = '''> A simulated DNA sequence.
TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAAC
GACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGA
GCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGG
ACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA
'''
dna = ''.join(dna.split('\n')[1:])
dna


Out[16]:
'TTAGGCAGTAACCCCGCGATAGGTAGAGCACGCAATCGTCAAGGCGTGCGGTAGGGCTTCCGTGTCTTACCCAAAGAAACGACGTAACGTTCCCCGGGCGGTTAAACCAAATCCACTTCACCAACGGCATAACGCGAAGCCCAAACTAAATCGCGCTCGAGCGGACGCACATTCGCTAGGCTGTGTAGGGGCAGTCTCCGTTAAGGACGATTACCACGTGATGGTAGTTCGCAACATTGGACTGTCGGGAATTCCCGAAGGCACTTAAGCGGAGTCTTAGCGTACAGTAACGCAGTCCCGCGTGAACGACTGACAGATGA'

In [17]:
from collections import Counter

c = Counter(zip(dna[:-1], dna[1:]))
c


Out[17]:
Counter({('A', 'A'): 26,
         ('A', 'C'): 25,
         ('A', 'G'): 22,
         ('A', 'T'): 11,
         ('C', 'A'): 21,
         ('C', 'C'): 20,
         ('C', 'G'): 33,
         ('C', 'T'): 12,
         ('G', 'A'): 19,
         ('G', 'C'): 24,
         ('G', 'G'): 22,
         ('G', 'T'): 23,
         ('T', 'A'): 19,
         ('T', 'C'): 17,
         ('T', 'G'): 11,
         ('T', 'T'): 14})

In [18]:
sum(c.values())


Out[18]:
319

Q5 (10 points)

The code given below performs a stochastic gradient descent to fit a quadratic polynomila to $n$ data points. Maake the code run faster by:

  • Using numba JIT
  • using Cython

Some test code is provided. Please run this for your optimized versios to confirm that they give the same resuls as the orignal code.


In [19]:
def sgd(b, x, y, max_iter, alpha):
    n = x.shape[0]
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b

In [20]:
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass


CPU times: user 7.88 s, sys: 8.68 ms, total: 7.89 s
Wall time: 7.89 s

In [21]:
from numba import jit

@jit(nopython = True)
def sgd_numba(b, x, y, max_iter, alpha):
    n = x.shape[0]
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b

In [22]:
# Call once to trigger compilation
sgd_numba(b0, x, y, k, alpha);

In [23]:
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd_numba(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass


CPU times: user 105 ms, sys: 3.06 ms, total: 108 ms
Wall time: 106 ms

In [24]:
%load_ext cython

In [25]:
%%cython -a
cimport cython

@cython.wraparound(False)
@cython.boundscheck(False)
def sgd_cython(double[:] b, double[:] x, double[:] y, int max_iter, double alpha):
    cdef int n = x.shape[0]
    cdef int i, j
    
    for i in range(max_iter):
        for j in range(n):
            b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
            b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
    return b


Out[25]:
Cython: _cython_magic_a400bf20ba756fb3f523f534b55b11f1.pyx

Generated by Cython 0.25.2

Yellow lines hint at Python interaction.
Click on a line that starts with a "+" to see the C code that Cython generated for it.

+01: cimport cython
  __pyx_t_1 = PyDict_New(); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_test, __pyx_t_1) < 0) __PYX_ERR(0, 1, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
 02: 
 03: @cython.wraparound(False)
 04: @cython.boundscheck(False)
+05: def sgd_cython(double[:] b, double[:] x, double[:] y, int max_iter, double alpha):
/* Python wrapper */
static PyObject *__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds); /*proto*/
static PyMethodDef __pyx_mdef_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython = {"sgd_cython", (PyCFunction)__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython, METH_VARARGS|METH_KEYWORDS, 0};
static PyObject *__pyx_pw_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython(PyObject *__pyx_self, PyObject *__pyx_args, PyObject *__pyx_kwds) {
  __Pyx_memviewslice __pyx_v_b = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_x = { 0, 0, { 0 }, { 0 }, { 0 } };
  __Pyx_memviewslice __pyx_v_y = { 0, 0, { 0 }, { 0 }, { 0 } };
  int __pyx_v_max_iter;
  double __pyx_v_alpha;
  PyObject *__pyx_r = 0;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("sgd_cython (wrapper)", 0);
  {
    static PyObject **__pyx_pyargnames[] = {&__pyx_n_s_b,&__pyx_n_s_x,&__pyx_n_s_y,&__pyx_n_s_max_iter,&__pyx_n_s_alpha,0};
    PyObject* values[5] = {0,0,0,0,0};
    if (unlikely(__pyx_kwds)) {
      Py_ssize_t kw_args;
      const Py_ssize_t pos_args = PyTuple_GET_SIZE(__pyx_args);
      switch (pos_args) {
        case  5: values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
        case  4: values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
        case  3: values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
        case  2: values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
        case  1: values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
        case  0: break;
        default: goto __pyx_L5_argtuple_error;
      }
      kw_args = PyDict_Size(__pyx_kwds);
      switch (pos_args) {
        case  0:
        if (likely((values[0] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_b)) != 0)) kw_args--;
        else goto __pyx_L5_argtuple_error;
        case  1:
        if (likely((values[1] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_x)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 1); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  2:
        if (likely((values[2] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_y)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 2); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  3:
        if (likely((values[3] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_max_iter)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 3); __PYX_ERR(0, 5, __pyx_L3_error)
        }
        case  4:
        if (likely((values[4] = PyDict_GetItem(__pyx_kwds, __pyx_n_s_alpha)) != 0)) kw_args--;
        else {
          __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, 4); __PYX_ERR(0, 5, __pyx_L3_error)
        }
      }
      if (unlikely(kw_args > 0)) {
        if (unlikely(__Pyx_ParseOptionalKeywords(__pyx_kwds, __pyx_pyargnames, 0, values, pos_args, "sgd_cython") < 0)) __PYX_ERR(0, 5, __pyx_L3_error)
      }
    } else if (PyTuple_GET_SIZE(__pyx_args) != 5) {
      goto __pyx_L5_argtuple_error;
    } else {
      values[0] = PyTuple_GET_ITEM(__pyx_args, 0);
      values[1] = PyTuple_GET_ITEM(__pyx_args, 1);
      values[2] = PyTuple_GET_ITEM(__pyx_args, 2);
      values[3] = PyTuple_GET_ITEM(__pyx_args, 3);
      values[4] = PyTuple_GET_ITEM(__pyx_args, 4);
    }
    __pyx_v_b = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[0]); if (unlikely(!__pyx_v_b.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_x = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[1]); if (unlikely(!__pyx_v_x.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_y = __Pyx_PyObject_to_MemoryviewSlice_ds_double(values[2]); if (unlikely(!__pyx_v_y.memview)) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_max_iter = __Pyx_PyInt_As_int(values[3]); if (unlikely((__pyx_v_max_iter == (int)-1) && PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error)
    __pyx_v_alpha = __pyx_PyFloat_AsDouble(values[4]); if (unlikely((__pyx_v_alpha == (double)-1) && PyErr_Occurred())) __PYX_ERR(0, 5, __pyx_L3_error)
  }
  goto __pyx_L4_argument_unpacking_done;
  __pyx_L5_argtuple_error:;
  __Pyx_RaiseArgtupleInvalid("sgd_cython", 1, 5, 5, PyTuple_GET_SIZE(__pyx_args)); __PYX_ERR(0, 5, __pyx_L3_error)
  __pyx_L3_error:;
  __Pyx_AddTraceback("_cython_magic_a400bf20ba756fb3f523f534b55b11f1.sgd_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __Pyx_RefNannyFinishContext();
  return NULL;
  __pyx_L4_argument_unpacking_done:;
  __pyx_r = __pyx_pf_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_sgd_cython(__pyx_self, __pyx_v_b, __pyx_v_x, __pyx_v_y, __pyx_v_max_iter, __pyx_v_alpha);

  /* function exit code */
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}

static PyObject *__pyx_pf_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_sgd_cython(CYTHON_UNUSED PyObject *__pyx_self, __Pyx_memviewslice __pyx_v_b, __Pyx_memviewslice __pyx_v_x, __Pyx_memviewslice __pyx_v_y, int __pyx_v_max_iter, double __pyx_v_alpha) {
  int __pyx_v_n;
  CYTHON_UNUSED int __pyx_v_i;
  int __pyx_v_j;
  PyObject *__pyx_r = NULL;
  __Pyx_RefNannyDeclarations
  __Pyx_RefNannySetupContext("sgd_cython", 0);
/* … */
  /* function exit code */
  __pyx_L1_error:;
  __Pyx_XDECREF(__pyx_t_28);
  __Pyx_AddTraceback("_cython_magic_a400bf20ba756fb3f523f534b55b11f1.sgd_cython", __pyx_clineno, __pyx_lineno, __pyx_filename);
  __pyx_r = NULL;
  __pyx_L0:;
  __PYX_XDEC_MEMVIEW(&__pyx_v_b, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_x, 1);
  __PYX_XDEC_MEMVIEW(&__pyx_v_y, 1);
  __Pyx_XGIVEREF(__pyx_r);
  __Pyx_RefNannyFinishContext();
  return __pyx_r;
}
/* … */
  __pyx_tuple__14 = PyTuple_Pack(8, __pyx_n_s_b, __pyx_n_s_x, __pyx_n_s_y, __pyx_n_s_max_iter, __pyx_n_s_alpha, __pyx_n_s_n, __pyx_n_s_i, __pyx_n_s_j); if (unlikely(!__pyx_tuple__14)) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_tuple__14);
  __Pyx_GIVEREF(__pyx_tuple__14);
/* … */
  __pyx_t_1 = PyCFunction_NewEx(&__pyx_mdef_46_cython_magic_a400bf20ba756fb3f523f534b55b11f1_1sgd_cython, NULL, __pyx_n_s_cython_magic_a400bf20ba756fb3f5); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_1);
  if (PyDict_SetItem(__pyx_d, __pyx_n_s_sgd_cython, __pyx_t_1) < 0) __PYX_ERR(0, 5, __pyx_L1_error)
  __Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
  __pyx_codeobj__15 = (PyObject*)__Pyx_PyCode_New(5, 0, 8, 0, 0, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__14, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_Users_cliburn_ipython_cython__c, __pyx_n_s_sgd_cython, 5, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__15)) __PYX_ERR(0, 5, __pyx_L1_error)
+06:     cdef int n = x.shape[0]
  __pyx_v_n = (__pyx_v_x.shape[0]);
 07:     cdef int i, j
 08: 
+09:     for i in range(max_iter):
  __pyx_t_1 = __pyx_v_max_iter;
  for (__pyx_t_2 = 0; __pyx_t_2 < __pyx_t_1; __pyx_t_2+=1) {
    __pyx_v_i = __pyx_t_2;
+10:         for j in range(n):
    __pyx_t_3 = __pyx_v_n;
    for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
      __pyx_v_j = __pyx_t_4;
+11:             b[0] -= alpha * (2*(b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_5 = 0;
      __pyx_t_6 = 1;
      __pyx_t_7 = __pyx_v_j;
      __pyx_t_8 = 2;
      __pyx_t_9 = __pyx_v_j;
      __pyx_t_10 = __pyx_v_j;
      __pyx_t_11 = 0;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_11 * __pyx_v_b.strides[0]) )) -= (__pyx_v_alpha * (2.0 * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_5 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_6 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_7 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_8 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_9 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_10 * __pyx_v_y.strides[0]) ))))));
+12:             b[1] =- alpha * (2*x[j] * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_12 = __pyx_v_j;
      __pyx_t_13 = 0;
      __pyx_t_14 = 1;
      __pyx_t_15 = __pyx_v_j;
      __pyx_t_16 = 2;
      __pyx_t_17 = __pyx_v_j;
      __pyx_t_18 = __pyx_v_j;
      __pyx_t_19 = 1;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_19 * __pyx_v_b.strides[0]) )) = ((-__pyx_v_alpha) * ((2.0 * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_12 * __pyx_v_x.strides[0]) )))) * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_13 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_14 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_15 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_16 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_17 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_18 * __pyx_v_y.strides[0]) ))))));
+13:             b[2] -= alpha * (2*x[j]**2 * (b[0] + b[1]*x[j] + b[2]*x[j]**2 - y[j]))
      __pyx_t_20 = __pyx_v_j;
      __pyx_t_21 = 0;
      __pyx_t_22 = 1;
      __pyx_t_23 = __pyx_v_j;
      __pyx_t_24 = 2;
      __pyx_t_25 = __pyx_v_j;
      __pyx_t_26 = __pyx_v_j;
      __pyx_t_27 = 2;
      *((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_27 * __pyx_v_b.strides[0]) )) -= (__pyx_v_alpha * ((2.0 * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_20 * __pyx_v_x.strides[0]) ))), 2.0)) * ((((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_21 * __pyx_v_b.strides[0]) ))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_22 * __pyx_v_b.strides[0]) ))) * (*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_23 * __pyx_v_x.strides[0]) ))))) + ((*((double *) ( /* dim=0 */ (__pyx_v_b.data + __pyx_t_24 * __pyx_v_b.strides[0]) ))) * pow((*((double *) ( /* dim=0 */ (__pyx_v_x.data + __pyx_t_25 * __pyx_v_x.strides[0]) ))), 2.0))) - (*((double *) ( /* dim=0 */ (__pyx_v_y.data + __pyx_t_26 * __pyx_v_y.strides[0]) ))))));
    }
  }
+14:     return b
  __Pyx_XDECREF(__pyx_r);
  __pyx_t_28 = __pyx_memoryview_fromslice(__pyx_v_b, 1, (PyObject *(*)(char *)) __pyx_memview_get_double, (int (*)(char *, PyObject *)) __pyx_memview_set_double, 0);; if (unlikely(!__pyx_t_28)) __PYX_ERR(0, 14, __pyx_L1_error)
  __Pyx_GOTREF(__pyx_t_28);
  __pyx_r = __pyx_t_28;
  __pyx_t_28 = 0;
  goto __pyx_L0;

In [26]:
%%time
np.random.seed(12345)
n = 10000
x = np.linspace(0, 10, n)
y = 2*x**2 + 6*x + 3 + np.random.normal(0, 5, n)
k = 100
alpha = 0.00001

b0 = np.random.random(3) 
b = sgd_cython(b0, x, y, k, alpha)

yhat = b[0] + b[1]*x+ b[2]*x**2
idx = sorted(np.random.choice(n, 100))
plt.scatter(x[idx], y[idx])
plt.plot(x[idx], yhat[idx], c='red')
pass


CPU times: user 104 ms, sys: 2.52 ms, total: 106 ms
Wall time: 105 ms

In [ ]: